Overview

Dataset statistics

Number of variables: 14
Number of observations: 256782
Missing cells: 40320
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 27.4 MiB
Average record size in memory: 112.0 B
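
The derived figures in this block follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and the average record size is total memory divided by the row count; the variable names are illustrative only:

```python
n_variables = 14
n_observations = 256782
missing_cells = 40320
total_memory_mib = 27.4  # rounded value as printed in the report

# Share of missing cells over the whole table
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Bytes per row, using 1 MiB = 1024**2 bytes
avg_record_bytes = total_memory_mib * 1024 ** 2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The record size comes out slightly below 112.0 B because the memory total is itself rounded to one decimal.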

Variable types

NUM: 9
CAT: 3
BOOL: 1
DATE: 1

Warnings

VERSIE has constant value "1" Constant
DATUM_BESTAND has constant value "2020-10-11" Constant
PEILDATUM has constant value "2020-10-01" Constant
TYPERENDE_DIAGNOSE_CD has a high cardinality: 1766 distinct values High cardinality
AANTAL_SUBTRAJECT_PER_ZPD is highly correlated with AANTAL_PAT_PER_ZPD High correlation
AANTAL_PAT_PER_ZPD is highly correlated with AANTAL_SUBTRAJECT_PER_ZPD High correlation
AANTAL_SUBTRAJECT_PER_DIAG is highly correlated with AANTAL_PAT_PER_DIAG High correlation
AANTAL_PAT_PER_DIAG is highly correlated with AANTAL_SUBTRAJECT_PER_DIAG High correlation
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with AANTAL_PAT_PER_SPC High correlation
AANTAL_PAT_PER_SPC is highly correlated with AANTAL_SUBTRAJECT_PER_SPC High correlation
GEMIDDELDE_VERKOOPPRIJS has 40320 (15.7%) missing values Missing
AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 21.28466669) Skewed
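
The Skewed flag appears on AANTAL_SUBTRAJECT_PER_ZPD (γ1 = 21.28) but not on AANTAL_PAT_PER_ZPD (skewness 16.68), which is consistent with a cutoff somewhere between those values. The sketch below shows how such flags could be derived from per-variable summaries. The function name `warnings_for`, the cutoff of 20 and the "any missing value" rule are assumptions for illustration, not settings read from config.yaml:

```python
# Hypothetical cutoff; the report's actual threshold is not shown in this document.
SKEW_CUTOFF = 20.0

def warnings_for(distinct, n_missing, n_rows, skewness=None):
    """Return the warning labels that apply to one variable's summary."""
    found = []
    if distinct == 1:
        found.append("Constant")
    if n_missing > 0:
        found.append(f"Missing ({n_missing / n_rows:.1%})")
    if skewness is not None and abs(skewness) > SKEW_CUTOFF:
        found.append("Skewed")
    return found

# Summaries copied from the variable sections of this report
print("VERSIE", warnings_for(1, 0, 256782))
print("GEMIDDELDE_VERKOOPPRIJS", warnings_for(3133, 40320, 256782, 7.96652267))
print("AANTAL_SUBTRAJECT_PER_ZPD", warnings_for(9409, 0, 256782, 21.28466669))
print("AANTAL_PAT_PER_ZPD", warnings_for(8772, 0, 256782, 16.68000941))
```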

Reproduction

Analysis started: 2020-10-25 20:13:08.808320
Analysis finished: 2020-10-25 20:13:38.764423
Duration: 29.96 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

VERSIE
Boolean

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Value | Count | Frequency (%)
1 | 256782 | 100.0%

DATUM_BESTAND
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Value | Count | Frequency (%)
2020-10-11 | 256782 | 100.0%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

PEILDATUM
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Value | Count | Frequency (%)
2020-10-01 | 256782 | 100.0%

[Figure: Frequencies of value counts]

Unique

Unique: 0
Unique (%): 0.0%

[Figure: Histogram of lengths of the category]

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

JAAR
Date

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
Minimum: 2012-01-01 00:00:00
Maximum: 2020-01-01 00:00:00

[Figure: Histogram with fixed size bins (bins=9)]

BEHANDELEND_SPECIALISME_CD
Real number (ℝ≥0)

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 421.9037316
Minimum: 301
Maximum: 8418
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 301
5-th percentile: 302
Q1: 305
Median: 313
Q3: 322
95-th percentile: 335
Maximum: 8418
Range: 8117
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 921.5860959
Coefficient of variation (CV): 2.184351611
Kurtosis: 71.13989567
Mean: 421.9037316
Median Absolute Deviation (MAD): 8
Skewness: 8.545382675
Sum: 108337284
Variance: 849320.9321
Monotonicity: Not monotonic
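
Several of the statistics listed for BEHANDELEND_SPECIALISME_CD are simple functions of the others. The following sketch recomputes them from the printed values, assuming the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count); small differences are rounding in the printed inputs:

```python
n = 256782                       # number of observations
mean, std = 421.9037316, 921.5860959
minimum, maximum = 301, 8418
q1, q3 = 305, 322

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50%
cv = std / mean                  # relative dispersion
variance = std ** 2
total = mean * n                 # sum of all values

print(value_range, iqr)
print(round(cv, 6), round(variance, 2), round(total))
```

Note that this column holds specialty codes, so the mean, sum and variance describe the code numbers rather than any measured quantity.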
[Figure: Histogram with fixed size bins (bins=27)]

Value | Count | Frequency (%)
305 | 36318 | 14.1%
313 | 33317 | 13.0%
303 | 29478 | 11.5%
330 | 20570 | 8.0%
316 | 17550 | 6.8%
308 | 13193 | 5.1%
306 | 10716 | 4.2%
324 | 10684 | 4.2%
301 | 10404 | 4.1%
304 | 8391 | 3.3%
Other values (17) | 66161 | 25.8%
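
The "Other values (17)" row aggregates the 17 codes outside the ten most frequent ones (27 distinct values in total), and each percentage is the count divided by the number of observations. A minimal sketch of how such a table can be built, using a hypothetical helper `frequency_table` and toy data rather than the real column:

```python
from collections import Counter

def frequency_table(values, top_n=10):
    """Top-n value counts plus one aggregated row for everything else."""
    counts = Counter(values)
    total = len(values)
    top = counts.most_common(top_n)
    rows = [(value, count, 100 * count / total) for value, count in top]
    n_other = len(counts) - len(top)
    if n_other > 0:
        other_count = total - sum(count for _, count in top)
        rows.append((f"Other values ({n_other})", other_count,
                     100 * other_count / total))
    return rows

# Toy data: five distinct codes, twelve rows
codes = [305] * 5 + [313] * 3 + [303] * 2 + [330, 316]
for label, count, pct in frequency_table(codes, top_n=3):
    print(f"{label} | {count} | {pct:.1f}%")
```

Applied to the figures above, the ten listed counts add up to 190621, which leaves 256782 − 190621 = 66161 rows for the aggregated row.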
 
Value | Count | Frequency (%)
301 | 10404 | 4.1%
302 | 5585 | 2.2%
303 | 29478 | 11.5%
304 | 8391 | 3.3%
305 | 36318 | 14.1%

Value | Count | Frequency (%)
8418 | 3359 | 1.3%
1900 | 168 | 0.1%
390 | 663 | 0.3%
389 | 2777 | 1.1%
362 | 3890 | 1.5%

TYPERENDE_DIAGNOSE_CD
Categorical

HIGH CARDINALITY

Distinct: 1766
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB

Value | Count | Frequency (%)
101 | 1074 | 0.4%
402 | 1052 | 0.4%
301 | 1025 | 0.4%
403 | 1021 | 0.4%
203 | 966 | 0.4%
201 | 959 | 0.4%
401 | 862 | 0.3%
404 | 850 | 0.3%
802 | 841 | 0.3%
409 | 833 | 0.3%
Other values (1756) | 247299 | 96.3%

[Figure: Frequencies of value counts]

Unique

Unique: 3
Unique (%): < 0.1%

[Figure: Histogram of lengths of the category]

Length

Max length: 4
Median length: 3
Mean length: 3.349814239
Min length: 2

ZORGPRODUCT_CD
Real number (ℝ≥0)

Distinct: 5891
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 440748534.1
Minimum: 10501002
Maximum: 998418081
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 10501002
5-th percentile: 28999036
Q1: 99799032
Median: 149599026
Q3: 990004004
95-th percentile: 990416049.9
Maximum: 998418081
Range: 987917079
Interquartile range (IQR): 890204972

Descriptive statistics

Standard deviation: 429110698.5
Coefficient of variation (CV): 0.9735952937
Kurtosis: -1.737371851
Mean: 440748534.1
Median Absolute Deviation (MAD): 119600023
Skewness: 0.4676016897
Sum: 1.131762901e+14
Variance: 1.841359916e+17
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
990004009 | 1893 | 0.7%
990004007 | 1859 | 0.7%
990003004 | 1820 | 0.7%
990004006 | 1477 | 0.6%
990356076 | 1326 | 0.5%
990356073 | 1227 | 0.5%
990003007 | 1177 | 0.5%
131999228 | 1136 | 0.4%
131999164 | 1122 | 0.4%
199299013 | 1073 | 0.4%
Other values (5881) | 242672 | 94.5%

Value | Count | Frequency (%)
10501002 | 6 | < 0.1%
10501003 | 9 | < 0.1%
10501004 | 9 | < 0.1%
10501005 | 9 | < 0.1%
10501007 | 3 | < 0.1%

Value | Count | Frequency (%)
998418081 | 123 | < 0.1%
998418080 | 105 | < 0.1%
998418079 | 29 | < 0.1%
998418077 | 6 | < 0.1%
998418076 | 6 | < 0.1%

AANTAL_PAT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8772
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 497.7230764
Minimum: 1
Maximum: 154283
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 13
Q3: 100
95-th percentile: 1675
Maximum: 154283
Range: 154282
Interquartile range (IQR): 97

Descriptive statistics

Standard deviation: 3099.267383
Coefficient of variation (CV): 6.226891077
Kurtosis: 399.6134163
Mean: 497.7230764
Median Absolute Deviation (MAD): 12
Skewness: 16.68000941
Sum: 127806327
Variance: 9605458.312
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
1 | 42622 | 16.6%
2 | 20949 | 8.2%
3 | 13582 | 5.3%
4 | 10163 | 4.0%
5 | 7827 | 3.0%
6 | 6544 | 2.5%
7 | 5471 | 2.1%
8 | 4632 | 1.8%
9 | 4272 | 1.7%
10 | 3724 | 1.5%
Other values (8762) | 136996 | 53.4%

Value | Count | Frequency (%)
1 | 42622 | 16.6%
2 | 20949 | 8.2%
3 | 13582 | 5.3%
4 | 10163 | 4.0%
5 | 7827 | 3.0%

Value | Count | Frequency (%)
154283 | 1 | < 0.1%
153907 | 1 | < 0.1%
151429 | 1 | < 0.1%
144723 | 1 | < 0.1%
113196 | 1 | < 0.1%

AANTAL_SUBTRAJECT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 9409
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 580.6849273
Minimum: 1
Maximum: 239907
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 14
Q3: 108
95-th percentile: 1886
Maximum: 239907
Range: 239906
Interquartile range (IQR): 105

Descriptive statistics

Standard deviation: 3917.510279
Coefficient of variation (CV): 6.746361227
Kurtosis: 725.1548283
Mean: 580.6849273
Median Absolute Deviation (MAD): 13
Skewness: 21.28466669
Sum: 149109437
Variance: 15346886.78
Monotonicity: Not monotonic
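
The skewness reported for this variable (γ1) is the third standardized moment: large positive values mean a long right tail, which matches a median of 14 against a maximum of 239907. A minimal sketch on toy data, assuming the plain population formula g1 = m3 / m2^1.5; the report's own figure may come from a bias-corrected estimator, so this is not a reproduction of its exact number:

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
long_tail = [1, 1, 1, 1, 10]  # toy data: one value far above the rest

print(skewness(symmetric))  # zero for a symmetric sample
print(skewness(long_tail))  # positive for a right-skewed sample
```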
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
1 | 41151 | 16.0%
2 | 20600 | 8.0%
3 | 13438 | 5.2%
4 | 10006 | 3.9%
5 | 7746 | 3.0%
6 | 6531 | 2.5%
7 | 5472 | 2.1%
8 | 4585 | 1.8%
9 | 4184 | 1.6%
10 | 3771 | 1.5%
Other values (9399) | 139298 | 54.2%

Value | Count | Frequency (%)
1 | 41151 | 16.0%
2 | 20600 | 8.0%
3 | 13438 | 5.2%
4 | 10006 | 3.9%
5 | 7746 | 3.0%

Value | Count | Frequency (%)
239907 | 1 | < 0.1%
232484 | 1 | < 0.1%
231318 | 1 | < 0.1%
227664 | 1 | < 0.1%
221364 | 1 | < 0.1%

AANTAL_PAT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7695
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7519.273189
Minimum: 1
Maximum: 211638
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 41
Q1: 393
Median: 1655
Q3: 6171
95-th percentile: 36005
Maximum: 211638
Range: 211637
Interquartile range (IQR): 5778

Descriptive statistics

Standard deviation: 17551.14893
Coefficient of variation (CV): 2.334154976
Kurtosis: 33.53386049
Mean: 7519.273189
Median Absolute Deviation (MAD): 1506
Skewness: 5.052410862
Sum: 1930814008
Variance: 308042828.7
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
21 | 452 | 0.2%
26 | 385 | 0.1%
11 | 378 | 0.1%
25 | 369 | 0.1%
37 | 368 | 0.1%
19 | 362 | 0.1%
6 | 359 | 0.1%
14 | 358 | 0.1%
20 | 358 | 0.1%
17 | 357 | 0.1%
Other values (7685) | 253036 | 98.5%

Value | Count | Frequency (%)
1 | 285 | 0.1%
2 | 310 | 0.1%
3 | 291 | 0.1%
4 | 337 | 0.1%
5 | 285 | 0.1%

Value | Count | Frequency (%)
211638 | 25 | < 0.1%
211089 | 23 | < 0.1%
209864 | 19 | < 0.1%
206394 | 17 | < 0.1%
203709 | 17 | < 0.1%

AANTAL_SUBTRAJECT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8509
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10571.53519
Minimum: 1
Maximum: 344226
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 1
5-th percentile: 50
Q1: 508
Median: 2257
Q3: 8644
95-th percentile: 50725
Maximum: 344226
Range: 344225
Interquartile range (IQR): 8136

Descriptive statistics

Standard deviation: 25534.33712
Coefficient of variation (CV): 2.415385908
Kurtosis: 37.938778
Mean: 10571.53519
Median Absolute Deviation (MAD): 2070
Skewness: 5.338831566
Sum: 2714579949
Variance: 652002372.2
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
38 | 330 | 0.1%
24 | 329 | 0.1%
82 | 325 | 0.1%
11 | 323 | 0.1%
39 | 317 | 0.1%
25 | 311 | 0.1%
13 | 306 | 0.1%
20 | 295 | 0.1%
6 | 294 | 0.1%
31 | 293 | 0.1%
Other values (8499) | 253659 | 98.8%

Value | Count | Frequency (%)
1 | 235 | 0.1%
2 | 258 | 0.1%
3 | 260 | 0.1%
4 | 261 | 0.1%
5 | 264 | 0.1%

Value | Count | Frequency (%)
344226 | 25 | < 0.1%
340616 | 19 | < 0.1%
334458 | 23 | < 0.1%
323774 | 20 | < 0.1%
302856 | 17 | < 0.1%

AANTAL_PAT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 242
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 658474.8873
Minimum: 488
Maximum: 1489520
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 488
5-th percentile: 43687
Q1: 248773
Median: 747052
Q3: 1006376
95-th percentile: 1345302
Maximum: 1489520
Range: 1489032
Interquartile range (IQR): 757603

Descriptive statistics

Standard deviation: 423038.6495
Coefficient of variation (CV): 0.6424522145
Kurtosis: -1.169335434
Mean: 658474.8873
Median Absolute Deviation (MAD): 322609
Skewness: 0.07813893512
Sum: 1.690844985e+11
Variance: 1.78961699e+11
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
880969 | 5102 | 2.0%
874276 | 4355 | 1.7%
844000 | 4348 | 1.7%
892885 | 4332 | 1.7%
872989 | 4269 | 1.7%
825506 | 4125 | 1.6%
1084292 | 3891 | 1.5%
1063992 | 3851 | 1.5%
1076299 | 3846 | 1.5%
1039023 | 3810 | 1.5%
Other values (232) | 214853 | 83.7%

Value | Count | Frequency (%)
488 | 51 | < 0.1%
1294 | 120 | < 0.1%
1949 | 131 | 0.1%
2584 | 173 | 0.1%
5662 | 15 | < 0.1%

Value | Count | Frequency (%)
1489520 | 2976 | 1.2%
1450641 | 3054 | 1.2%
1421880 | 3564 | 1.4%
1345302 | 3543 | 1.4%
1333146 | 3547 | 1.4%

AANTAL_SUBTRAJECT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 242
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1039156.657
Minimum: 510
Maximum: 2578269
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 510
5-th percentile: 47358
Q1: 355194
Median: 989615
Q3: 1729145
95-th percentile: 2389572
Maximum: 2578269
Range: 2577759
Interquartile range (IQR): 1373951

Descriptive statistics

Standard deviation: 729408.9524
Coefficient of variation (CV): 0.7019239571
Kurtosis: -0.9481651724
Mean: 1039156.657
Median Absolute Deviation (MAD): 663885
Skewness: 0.3372764253
Sum: 2.668367246e+11
Variance: 5.320374198e+11
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
1211808 | 5102 | 2.0%
1281564 | 4355 | 1.7%
1216288 | 4348 | 1.7%
1312623 | 4332 | 1.7%
1286264 | 4269 | 1.7%
1209908 | 4125 | 1.6%
2557655 | 3891 | 1.5%
2489689 | 3851 | 1.5%
2578269 | 3846 | 1.5%
2066523 | 3810 | 1.5%
Other values (232) | 214853 | 83.7%

Value | Count | Frequency (%)
510 | 51 | < 0.1%
1494 | 120 | < 0.1%
2231 | 131 | 0.1%
2924 | 173 | 0.1%
5711 | 15 | < 0.1%

Value | Count | Frequency (%)
2578269 | 3846 | 1.5%
2557655 | 3891 | 1.5%
2489689 | 3851 | 1.5%
2389572 | 3777 | 1.5%
2184851 | 3757 | 1.5%

GEMIDDELDE_VERKOOPPRIJS
Real number (ℝ≥0)

MISSING

Distinct: 3133
Distinct (%): 1.4%
Missing: 40320
Missing (%): 15.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 3480.855162
Minimum: 70
Maximum: 287220
Zeros: 0
Zeros (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 70
5-th percentile: 140
Q1: 460
Median: 1215
Q3: 3980
95-th percentile: 13220
Maximum: 287220
Range: 287150
Interquartile range (IQR): 3520

Descriptive statistics

Standard deviation: 6565.924627
Coefficient of variation (CV): 1.886296419
Kurtosis: 174.3252164
Mean: 3480.855162
Median Absolute Deviation (MAD): 990
Skewness: 7.96652267
Sum: 753472870
Variance: 43111366.2
Monotonicity: Not monotonic
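
Because 40320 prices are missing, some figures for GEMIDDELDE_VERKOOPPRIJS only add up when computed over the non-missing rows. The sketch below checks this from the printed values; the interpretation (distinct percentage and sum taken over non-missing rows, missing percentage over all rows) is inferred from the numbers, not stated by the report:

```python
n_rows = 256782
n_missing = 40320
n_present = n_rows - n_missing          # rows that actually carry a price

missing_pct = 100 * n_missing / n_rows  # share of all rows
distinct_pct = 100 * 3133 / n_present   # 3133 distinct prices
total = 3480.855162 * n_present         # mean x non-missing count

print(n_present)
print(f"{missing_pct:.1f}% missing, {distinct_pct:.1f}% distinct")
print(round(total))
```

Dividing the distinct count by all 256782 rows instead would give about 1.2%, which does not match the 1.4% shown.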
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
105 | 1830 | 0.7%
160 | 1780 | 0.7%
110 | 1461 | 0.6%
180 | 1361 | 0.5%
185 | 1268 | 0.5%
145 | 1260 | 0.5%
300 | 1240 | 0.5%
140 | 1203 | 0.5%
165 | 1181 | 0.5%
500 | 1148 | 0.4%
Other values (3123) | 202730 | 79.0%
(Missing) | 40320 | 15.7%

Value | Count | Frequency (%)
70 | 226 | 0.1%
75 | 75 | < 0.1%
80 | 361 | 0.1%
85 | 909 | 0.4%
90 | 502 | 0.2%

Value | Count | Frequency (%)
287220 | 8 | < 0.1%
148910 | 3 | < 0.1%
142850 | 4 | < 0.1%
122155 | 4 | < 0.1%
116765 | 3 | < 0.1%

Interactions

[Figure: interaction plots (81 panels)]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line, that is its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
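
As an illustration of that definition, here is a minimal pure-Python sketch. The function name `pearson_r` and the toy values are illustrative, and this is not the code path the report itself uses:

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so plain sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

patients = [1, 3, 13, 100, 1675]             # toy values
subtrajects = [2 * p + 5 for p in patients]  # exact linear relation

print(pearson_r(patients, subtrajects))
# Shifting and rescaling one variable (positive factor) leaves r unchanged.
print(pearson_r([10 * p - 7 for p in patients], subtrajects))
```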

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
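
The rank-based recipe can be sketched as follows. Ties receive the average of the ranks they span, which is a common convention assumed here; the helper names are illustrative:

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]  # monotonic but not linear

print(spearman_rho(xs, ys))  # ranks agree perfectly
print(pearson_r(xs, ys))     # below 1 because the relation is curved
```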

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
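
A direct, quadratic-time sketch of that pair-counting definition follows. It implements the simple variant often called tau-a, where tied pairs count as neither concordant nor discordant; statistical libraries commonly apply a tie correction (tau-b), so results can differ on data with ties:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1  # both variables move the same way
        elif direction < 0:
            discordant += 1  # the variables move in opposite ways
    return (concordant - discordant) / len(pairs)

print(kendall_tau([1, 2, 3], [1, 3, 2]))  # two concordant pairs, one discordant
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))
```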

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

[Figure: missing values plots (3 panels)]

Sample

First rows

index | VERSIE | DATUM_BESTAND | PEILDATUM | JAAR | BEHANDELEND_SPECIALISME_CD | TYPERENDE_DIAGNOSE_CD | ZORGPRODUCT_CD | AANTAL_PAT_PER_ZPD | AANTAL_SUBTRAJECT_PER_ZPD | AANTAL_PAT_PER_DIAG | AANTAL_SUBTRAJECT_PER_DIAG | AANTAL_PAT_PER_SPC | AANTAL_SUBTRAJECT_PER_SPC | GEMIDDELDE_VERKOOPPRIJS
0 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 557 | 70401002 | 15 | 15 | 44291 | 56007 | 1191136 | 1910144 | 1425.0
1 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 554 | 70401002 | 12596 | 14295 | 200174 | 288400 | 1191136 | 1910144 | 1425.0
2 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 559 | 70401002 | 58 | 60 | 4445 | 5485 | 1191136 | 1910144 | 1425.0
3 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 554 | 70401003 | 97 | 98 | 200174 | 288400 | 1191136 | 1910144 | 625.0
4 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 559 | 70401003 | 6 | 6 | 4445 | 5485 | 1191136 | 1910144 | 625.0
5 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 557 | 70401003 | 10 | 10 | 44291 | 56007 | 1191136 | 1910144 | 625.0
6 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 559 | 70401004 | 10 | 10 | 4445 | 5485 | 1191136 | 1910144 | NaN
7 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 557 | 70401004 | 4 | 4 | 44291 | 56007 | 1191136 | 1910144 | NaN
8 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 554 | 70401004 | 69 | 70 | 200174 | 288400 | 1191136 | 1910144 | NaN
9 | 1.0 | 2020-10-11 | 2020-10-01 | 2016-01-01 | 301 | 554 | 70401006 | 2 | 2 | 200174 | 288400 | 1191136 | 1910144 | NaN

Last rows

index | VERSIE | DATUM_BESTAND | PEILDATUM | JAAR | BEHANDELEND_SPECIALISME_CD | TYPERENDE_DIAGNOSE_CD | ZORGPRODUCT_CD | AANTAL_PAT_PER_ZPD | AANTAL_SUBTRAJECT_PER_ZPD | AANTAL_PAT_PER_DIAG | AANTAL_SUBTRAJECT_PER_DIAG | AANTAL_PAT_PER_SPC | AANTAL_SUBTRAJECT_PER_SPC | GEMIDDELDE_VERKOOPPRIJS
2567721.02020-10-112020-10-012018-01-013163504991630064451722445889758280NaN
2567731.02020-10-112020-10-012018-01-01316350499163006510111722445889758280205.0
2567741.02020-10-112020-10-012018-01-0131635189916300691117872573445889758280NaN
2567751.02020-10-112020-10-012018-01-0131635209916300691133785109445889758280NaN
2567761.02020-10-112020-10-012018-01-0131635209916300701516337851094458897582802660.0
2567771.02020-10-112020-10-012018-01-01316351899163007011178725734458897582802660.0
2567781.02020-10-112020-10-012018-01-01316351799163007011108215524458897582802660.0
2567791.02020-10-112020-10-012018-01-01316352299163007011135118714458897582802660.0
2567801.02020-10-112020-10-012018-01-013167610991630070111141494458897582802660.0
2567811.02020-10-112020-10-012018-01-013163521991630070112873724458897582802660.0